FP-Growth Algorithm

FP-Growth (Frequent Pattern Growth) is an efficient and scalable method for mining the complete set of frequent patterns in a database. It improves upon the Apriori algorithm by eliminating the need to generate candidate itemsets, resulting in significantly better performance.

Core Components

1. FP-Tree (Frequent Pattern Tree)

A compact data structure that stores quantitative information about frequent patterns
Each path represents a set of transactions that share the same frequent items
Each node contains:
- Item name
- Count (frequency of the item)
- Node-link (pointer to the next node with the same item)

2. Header Table

Contains each frequent item
Count of occurrences
Pointer to the first occurrence of the item in the FP-tree

Algorithm Steps

Phase 1: FP-Tree Construction

Scan the database once to identify frequent items and their support counts
Sort frequent items in each transaction by descending support count
Construct the FP-tree:
- Create the root node labeled "null"
- For each transaction, insert its sorted frequent items into the tree
- If a path already exists, increment the count of nodes along the path
- Otherwise, create new nodes with count 1

Phase 2: Frequent Pattern Extraction

Mine patterns from the FP-tree using a recursive divide-and-conquer approach
For each item in the header table (from bottom up):
- Generate a frequent pattern with this item
- Construct its conditional pattern base (sub-paths leading to this item)
- Construct its conditional FP-tree
- Recursively mine the conditional FP-tree
- Terminate when the conditional FP-tree is empty

Example

Consider this transaction dataset:

Transaction ID	Items Purchased
1	f, a, c, d, g, i, m, p
2	a, b, c, f, l, m, o
3	b, f, h, j, o
4	b, c, k, s, p
5	a, f, c, e, l, p, m, n

With minimum support = 3 (items must appear in at least 3 transactions):

Step 1: Find frequent 1-itemsets

Frequent items: a:3, b:3, c:4, f:4, m:3, p:3 (sorted by frequency: c, f, a, b, m, p)

Step 2: Order items by frequency in each transaction

f, c, a, m, p
f, c, a, b, m
f, b
c, b, p
f, c, a, m, p

Step 3: Build FP-Tree

Create paths for each transaction
Merge overlapping paths and increment counters

Step 4: Mine frequent patterns

Start from the least frequent item (p) in the header table
Follow node-links to find all paths containing p
Extract conditional pattern bases
Build conditional FP-tree for p
Recursively mine patterns

Advantages over Apriori

No candidate generation: Eliminates the costly process of candidate generation and testing
Compressed representation: FP-tree provides a compact representation of the transaction database
No repeated database scans: Scans the database only twice (vs. multiple scans in Apriori)
"Divide and conquer": Breaks down the problem into smaller subproblems
Much faster: Typically 3-4 times faster than Apriori, especially for large datasets

Limitations of FP-Growth

Memory intensive: The entire FP-tree needs to fit in memory
Complex implementation: More difficult to implement than Apriori
Less intuitive: Harder to understand conceptually than Apriori

Variations and Improvements

Parallel FP-Growth: Distributed versions for handling very large datasets
CT-PRO (Compressed FP-tree): More memory-efficient variant
COFI-tree: An optimized approach for mining specific items
CanTree: A canonical tree structure that doesn't require pre-sorting

Applications

E-commerce recommendation systems
Web usage mining
Bioinformatics (finding patterns in biological sequences)
Intrusion detection in cybersecurity
Cross-selling strategies in retail
Document clustering and classification

FP-Growth Algorithm

Core Components​

1. FP-Tree (Frequent Pattern Tree)​

2. Header Table​

Algorithm Steps​

Phase 1: FP-Tree Construction​

Phase 2: Frequent Pattern Extraction​

Example​

Step 1: Find frequent 1-itemsets​

Step 2: Order items by frequency in each transaction​

Step 3: Build FP-Tree​

Step 4: Mine frequent patterns​

Advantages over Apriori​

Limitations of FP-Growth​

Variations and Improvements​

Applications​